A methodology for automatic classification of breast cancer immunohistochemical data using semi-supervised Fuzzy c-means

Authors

  • Daphne Teck Ching Lai
  • Jonathan M. Garibaldi
  • Daniele Soria
  • Chris M. Roadknight
Abstract

Previously, a semi-manual method was used to identify six novel and clinically useful classes in the Nottingham Tenovus Breast Cancer dataset, classifying 663 of 1076 patients. The objectives of our work are threefold. First, our primary objective is to use a single automatic method (post-initialisation) to reproduce the six classes for the 663 patients and to classify the remaining 413 patients. Second, we explore semi-supervised fuzzy c-means (ssFCM) with various distance metrics and initialisation techniques to achieve this. Third, we examine the clinical characteristics of the 413 patients by comparing them with those of the 663 patients. Our experiments use various amounts of labelled data and 10-fold cross-validation to reproduce and evaluate the classification. ssFCM with Euclidean distance and the initialisation technique of Katsavounidis et al. produced the best results and was then used to classify the 413 patients. Visual evaluation of the 413 patients’ classifications revealed characteristics in common with those previously reported. Examination of clinical characteristics indicates significant associations between classification and clinical parameters. More importantly, the survival curves show an association between classification and survival.
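As a rough illustration of the two ingredients named in the abstract, the sketch below shows a maximin seeding in the style usually attributed to Katsavounidis et al. and a single semi-supervised membership update with Euclidean distance. The abstract does not state which ssFCM variant was used, so the update here is an assumption: it is derived from a Pedrycz and Waletzky style objective with fuzzifier 2 and scaling parameter `alpha`. The function names `kkz_init` and `ssfcm_memberships` are hypothetical, and the sketch assumes that cluster index i already corresponds to class i, which a full pipeline would have to arrange separately.

```python
import math


def kkz_init(points, c):
    """Choose c initial centres by maximin seeding (Katsavounidis-style sketch)."""
    # First centre: the point with the largest Euclidean norm.
    centres = [max(points, key=lambda p: math.hypot(*p))]
    while len(centres) < c:
        # Next centre: the point whose nearest chosen centre is farthest away.
        centres.append(
            max(points, key=lambda p: min(math.dist(p, v) for v in centres))
        )
    return centres


def ssfcm_memberships(points, labels, centres, alpha=1.0):
    """One membership update for an assumed ssFCM objective (fuzzifier 2).

    labels[k] is a class index for a labelled point, or None if unlabelled.
    Assumes cluster i corresponds to class i (an illustrative simplification).
    """
    c = len(centres)
    memberships = []
    for x, lab in zip(points, labels):
        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = [max(math.dist(x, v) ** 2, 1e-12) for v in centres]
        inv_sum = sum(1.0 / d for d in d2)
        # One-hot supervision vector; all zeros for an unlabelled point.
        f = [1.0 if lab == i else 0.0 for i in range(c)]
        # Share of membership left to the unsupervised (distance-based) term.
        free = 1.0 + alpha * (1.0 - sum(f))
        row = [
            (free / (d2[i] * inv_sum) + alpha * f[i]) / (1.0 + alpha)
            for i in range(c)
        ]
        memberships.append(row)
    return memberships
```

For an unlabelled point the update reduces to the ordinary fuzzy c-means membership, while a labelled point is guaranteed a membership of at least `alpha / (1 + alpha)` in its labelled class. Each row sums to 1 by construction.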

Similar resources

A Preliminary Study on Automatic Breast Cancer Data Classification using Semi-supervised Fuzzy c-Means

Soria et al. have successfully identified six clinically useful and novel subgroups in the Nottingham Tenovus Breast Cancer dataset. However, the methodology used is semi-manual, and so far no single clustering method can classify the dataset automatically. In this work, two variations of semi-supervised Fuzzy c-means (ssFCM) algorithms are explored to classify the Nottingham Tenovus Breast Cancer datas...

Investigating Distance Metrics in Semi-supervised Fuzzy c-Means for Breast Cancer Classification

In previous work, semi-supervised Fuzzy c-means (ssFCM) was used as an automatic classification technique to classify the Nottingham Tenovus Breast Cancer (NTBC) dataset as no method to do this currently exists. However, the results were poor when compared with semi-manual classification. It is known that the NTBC data is highly non-normal and it was suspected that this affected the poor result...

Semi-Supervised Techniques in Breast Cancer Classification: A Comparison between Transductive SVM and Semi-Supervised FCM

The Nottingham Tenovus Breast Cancer data has been successfully classified into six novel and clinically useful subgroups. However, the existing technique is semi-manual. In this work, we use Transductive Support Vector Machine (TSVM) and semi-supervised Fuzzy c-means (ssFCM) as automatic techniques to classify the dataset and evaluate our results using 10-fold cross-validation. A ...

Automatic Prostate Cancer Segmentation Using Kinetic Analysis in Dynamic Contrast-Enhanced MRI

Background: Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) provides functional information on the microcirculation in tissues by analyzing the enhancement kinetics, which can be used as biomarkers for prostate lesion detection and characterization. Objective: The purpose of this study is to investigate spatiotemporal patterns of tumors by extracting semi-quantitative as well as w...

An investigation on scaling parameter and distance metrics in semi-supervised Fuzzy c-means

The scaling parameter α helps maintain a balance between supervised and unsupervised learning in semi-supervised Fuzzy c-Means (ssFCM). In this study, we investigated the effects of different α values (0.1, 0.5, 1 and 10) in Pedrycz and Waletzky’s ssFCM with various amounts of labelled data (10%, 20%, 30%, 40%, 50% and 60%) and three distance metrics, Euclidean, Mahalanobis and kernel-based on th...

Journal:
  • CEJOR

Volume 22

Publication date: 2014